A general, prediction error-based criterion for selecting model complexity for high-dimensional survival models.

Authors

  • Christine Porzelius
  • Martin Schumacher
  • Harald Binder
Abstract

When fitting predictive survival models to high-dimensional data, an adequate criterion for selecting model complexity is needed to avoid overfitting. The complexity parameter is typically selected by the predictive partial log-likelihood (PLL), estimated via cross-validation. As an alternative criterion, we propose a relative version of the integrated prediction error curve (IPEC), which can be stably estimated via bootstrap resampling. The IPEC has the advantage of being applicable to models and fitting techniques for which the PLL is not available. To investigate the performance of this new criterion, a simulation study mimicking microarray survival data is carried out. Additionally, model selection by the predictive PLL estimated via bootstrap resampling, instead of cross-validation, is examined. The models selected this way mostly show prediction performance similar to that of models selected by cross-validation estimates. Model selection by bootstrap estimates of the IPEC performs about as well as selection by cross-validation estimates of the PLL. The IPEC is therefore expected to be a reasonable alternative in cases where no PLL is available. Similar results are seen in the analysis of a microarray survival data set from patients with diffuse large B-cell lymphoma.
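
To make the IPEC concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the prediction error curve is the Brier score over time with inverse-probability-of-censoring weights, that there are no tied event and censoring times, and that "relative" means the ratio of a model's IPEC to that of a covariate-free Kaplan-Meier null model; the abstract does not spell out any of these details. All function names are hypothetical.

```python
import numpy as np

def km_step(time, event):
    """Kaplan-Meier steps: jump times and the survival value just after each jump."""
    jumps = np.unique(time[event == 1])
    surv, values = 1.0, []
    for u in jumps:
        at_risk = np.sum(time >= u)
        failures = np.sum((time == u) & (event == 1))
        surv *= 1.0 - failures / at_risk
        values.append(surv)
    return jumps, np.array(values)

def step_eval(jumps, values, t, left_limit=False):
    """Evaluate the right-continuous step function at t, or its left limit."""
    side = "left" if left_limit else "right"
    idx = np.searchsorted(jumps, t, side=side)
    return np.concatenate(([1.0], values))[idx]

def prediction_error_curve(time, event, surv_pred, grid):
    """Weighted Brier score at each grid time.

    surv_pred[i, k] is the predicted probability that subject i
    survives beyond grid[k].
    """
    # Censoring distribution G: Kaplan-Meier with the event indicator flipped.
    c_jumps, c_values = km_step(time, 1 - event)
    g_obs = step_eval(c_jumps, c_values, time, left_limit=True)  # G(T_i-)
    g_grid = step_eval(c_jumps, c_values, grid)                  # G(t)
    pe = np.empty(len(grid))
    for k, t in enumerate(grid):
        failed = (time <= t) & (event == 1)
        alive = time > t
        # Subjects censored before t get weight zero.
        weight = np.where(failed, 1.0 / g_obs, 0.0)
        weight = weight + np.where(alive, 1.0 / g_grid[k], 0.0)
        pe[k] = np.mean(weight * (alive - surv_pred[:, k]) ** 2)
    return pe

def ipec(pe, grid):
    """Integrate the prediction error curve with the trapezoidal rule."""
    return float(np.sum((pe[1:] + pe[:-1]) / 2.0 * np.diff(grid)))

def relative_ipec(time, event, surv_pred, grid):
    """Model IPEC divided by the IPEC of the Kaplan-Meier null model (assumed definition)."""
    jumps, values = km_step(time, event)
    null_pred = np.tile(step_eval(jumps, values, grid), (len(time), 1))
    model = ipec(prediction_error_curve(time, event, surv_pred, grid), grid)
    null = ipec(prediction_error_curve(time, event, null_pred, grid), grid)
    return model / null
```

Under this reading, values below 1 would indicate that a model predicts better than the null model. For selecting a complexity parameter, such a function would be evaluated on bootstrap out-of-sample observations for each candidate value, and the value with the smallest estimate chosen; the grid should stay below the largest observed time so that the censoring weights remain finite.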

Related resources

A New High-order Takagi-Sugeno Fuzzy Model Based on Deformed Linear Models

Amongst possible choices for identifying complicated processes for prediction, simulation, and approximation applications, high-order Takagi-Sugeno (TS) fuzzy models are fitting tools. Although they can construct models with rather high complexity, they are not as interpretable as first-order TS fuzzy models. In this paper, we first propose to use Deformed Linear Models (DLMs) in consequence pa...


Prediction of ultimate strength of shale using artificial neural network

A rock failure criterion is very important for predicting ultimate strength in rock mechanics and geotechnics; it is determined for rock mechanics studies in mining, civil, and oil wellbore drilling operations. Shales are also among the most difficult formations to treat. Therefore, in this research work, using the artificial neural network (ANN), a model was built to predict the ultimat...


An unbiased Cp criterion for multivariate ridge regression

Mallows’ Cp statistic is widely used for selecting multivariate linear regression models. It can be considered to be an estimator of a risk function based on an expected standardized mean square error of prediction. Fujikoshi and Satoh (1997) have proposed an unbiased Cp criterion (called modified Cp; MCp) for selecting multivariate linear regression models. In this paper, the unbiased Cp crite...


Comparison of non-linear models for describing the growth curve from birth to yearling in Markhoz goats

The objective of this study was to select the best model among five non-linear growth functions, i.e., Brody, Gompertz, Logistic, Von Bertalanffy and Negative exponential, for describing the growth curve in Markhoz goats. The data included 5557 body weight records of goats from birth to yearling, collected from 2006 to 2013 at Sanandaj Research Station. Growth curve parameters (A, B, ...


Ensemble Kernel Learning Model for Prediction of Time Series Based on the Support Vector Regression and Meta Heuristic Search

In this paper, a method for predicting time series is presented. Time series prediction is a process that predicts future system values based on information obtained from past and present data points. Time series prediction models are widely used in various fields of engineering, economics, etc. The main purpose of using different models for time series prediction is to make the forecast with...



Journal:
  • Statistics in Medicine

Volume 29, Issue 7-8

Pages: -

Publication date: 2010